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MULTILEVEL ENSEMBLE KALMAN FILTERING 


HaKON HOEL*, KODY J. H. LAW*, AND RAUL TEMPONE* 

Abstract. This work embeds a multilevel Monte Carlo (MLMC) sampling strategy into the 
Monte Carlo step of the ensemble Kalman filter (EnKF) in the setting of finite dimensional signal 
evolution and noisy discrete-time observations. The signal dynamics is assumed to be governed by a 
stochastic differential equation (SDE), and a hierarchy of time grids is introduced for multilevel nu¬ 
merical integration of that SDE. The resulting multilevel ensemble Kalman filter method (MLEnKF) 
is proved to asymptotically outperform EnKF in terms of computational cost vs. approximation ac¬ 
curacy. The theoretical results are illustrated numerically. 
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1. Introduction. Filtering refers to the sequential estimation of the state u 
and/or parameters p of a system through sequential incorporation of online data y. 
The most complete estimation of the state u n at time n is given by its probability 
density conditional on the observations up to the given time Tr(u n \yi ,..., y n ) [28; [3]. 
For linear Gaussian systems the analytical solution may be given in closed form, via 
an update formulae for the mean and covariance known as the Kalman filter }29j . 
However, in general there is no closed form solution. One must therefore resort to 
either algorithms which approximate the probabilistic solution by leveraging ideas 
from control theory [30i [28], or Monte Carlo methods to approximate the filtering 
distribution itself [3, [13] [12]. The ensemble Kalman filter (EnKF) [7} [T5] combines 
elements of both approaches. In the linear Gaussian case, it converges to the Kalman 
filter solution [35], and even in the nonlinear case, under suitable assumptions it 
converges [35] [33] to what one may argue is the optimal filter among those which 
incorporate the data linearly G2H320Q]- In the case of spatial models approximated 
on a numerical grid, the state space itself may become very high-dinrensional and 
even the linear solves may become intractable. Therefore, one may be inclined to use 
the EnKF filter even for linear Gaussian problems in which the solution is intractable 
despite being given in closed form on paper by the Kalman filter. 

For problems which admit hierarchies of approximations with cost inversely pro¬ 
portional to accuracy, it is natural to leverage solutions to less expensive and less 
accurate approximations in order to accelerate the convergence of the more expensive 
and more accurate approximations. This idea originated in the iterative solution of 
numerical approximation of elliptic PDE as early as [IB]. In the context of iterative 
solution of PDE, the methodology, which may be used both for solution as well as 
pre-conditioner, has become known as multigrid see [ 231 for a general reference. The 
same idea may be applied in the context of Monte Carlo approximation of random 
fields as proposed in [24] , and later studied in detail in the context of SDE in [T8] 
and PDE in [10] . There has been an explosion of recent activity since [IS], including 
for example m nu m © m- Even more recently, work is beginning to emerge 
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extending the same multilevel framework beyond ’’vanilla” Monte Carlo to its mani¬ 
festation in the context of Bayesian inference, anywhere that one has a discretization 
error inversely proportional to cost in the computation of a single sample and the 
Monte Carlo 0[M ~ X ! 2 ) rate of convergence. Examples in the context of Markov 
Chain Monte Carlo appeared in [31] [25]. To the knowledge of the authors there has 
yet to be extension of the methodology to the filtering context. As a first step, this 
work explores the extension of the EnKF to its multilevel implementation, which is 
naturally referred to as multilevel EnKF (MLEnKF). In the case of linear Gaussian 
problems, the limiting distribution is the gold-standard Bayesian posterior distribu¬ 
tion, while in the non-Gaussian case it is something else (see, e.g. S7, 30] and 
references therein for further discussion). 

The rest of the paper will be organized as follows. In section [2] the filtering 
problem will be introduced, the Kalman filter and EnKF will be reviewed, and the 
new multilevel EnKF (MLEnKF) will be introduced for the first time in sub-section 


favorable asymptotic ”cost-to-e” as the standard MLMC for a finite time horizon, 
and its mean-field limiting distribution is the filtering distribution in the linear and 
Gaussian case. In section [4] the theory is illustrated with numerical examples of 
the Ornstein-Uhlenbeck stochastic differential equation, and the geometric Brownian 
motion. Finally, conclusions and future directions are presented in section [5] 

2. Kalman filtering. Here the filtering problem will be introduced in section 
M and then the Kalman filter will be introduced for linear Gaussian state-space 
models in section |2.2[ In section |2.3[ its Monte Carlo implementation of the Kalman 
filter will be introduced, which is known as the ensemble Kalman filter (ENKF). In 
section |2.4[ a the multilevel implementation is introduced for the first time. 

2.1. General set-up. Let (f2, £,P) be a complete probability space, where fl is 
the set of all possible outcomes, 8 is the sigma algebra generated by fl and P is the 
associated probability measure. Consider the general stochastic signal evolution for 
the random variables u n A 2 —>■ R d , with d < oo, 

u n +1 = ^(Un), ( 2 . 1 ) 

for n = 0,1,..., N — 1. Given the history of a noisy signal observation 


2.4 In section G] it is proven that indeed the MLEnKF inherits almost the same 


y n = Hu n + r] n , 


where H £ R mxd anc [ ^ are j j ^ w ith r /1 ~ ./V(0,r),r £ R mxrra symmetric positive 
definite, the objective is to track the signal u n given observations Y n = Y° hs with 
Y n = (yi, j/ 2 , ■ • •, Un) and Y° hs = ( y ° hs , y% hs , ■.., y° bs ) the given observed realization 
ofY n . In other words, the aim is to approximate the random variable u n \(Y n = Y^ bs ). 
Notice that under the given assumptions this is a hidden Markov model and the 
density of the random variable we seek to approximate admits the following sequential 
structure 


7 TuAu\Y n = K° bS ) = 


^vAvn 8 K = u)Tt Un {u\Y n _i = y° b i) 


%nte bs l^-i = ^ s i) 

7T«„(w|y n -l = Y°*\) = ( Tr Un (u\u n -l = ( v \Y n -l = K°^i) du, 

Js. d 

KyAVn 8 \Yn-l = ^-i) = [ %„ K = u)ir Uri (u|Y„_i = Y^\) d U, 

JR d 


( 2 . 2 ) 
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Here, nx 1 (xi\X 2 = X 2 ) denotes the marginal probability density of A'i conditional 
that X 2 = X 2 - 

It will be assumed that d'(-) cannot be evaluated exactly, but rather only approx¬ 
imately, and that there exists a hierarchy of accuracies at which it can be evaluated 
each with its associated cost. The explicit dependence on w will be suppressed where 
confusion is not possible. In particular, we will be concerned herein with the case 
in which u n + 1 = 4'(u rl ) := it(l;w(0) = u n ) is given by the evolution at t = 1 of the 
following SDE 


du(t) = a{u{t))dt + b(u(t))dW(t + n), £€(0,1), (2.3) 

given the prescribed initial condition u( 0 ) = u n , where a : R d —> R d , b : —> R dxr , 
and W : x [0,00) —> K r is an r —dimensional Wiener process. Furthermore, a and 
b will satisfy the following conditions 


|a(u) — a(u)| + | b(u) — b(v)\ < c\u — r| for all u,v € and some c > 0. (2.4) 


This fits into the framework of (2.1). Notice that the randomness comes from the 


initial condition and the Wiener process W only, and not the coefficients, however 
it could in principle come also from the latter. Indeed if the analytical solution is 
unknown, then the system above must be approximated, leading to the hierarchy of 
approximations to d'(-). In particular, denote by a hierarchy of approxi¬ 

mations to the solution d' := 4/°° of (2.3). First some assumptions must be made. 


Assumption 1. For every p > 2, the solution operators satisfy the 

following conditions, for some 0 < < 00 depending on 

(i) ||**(«) - *»|| p < c*||u - v\\ p , 

(n) ||^( U )||?<c*(l + |H|J), 

where the following notation is introduced ||u|| p := E[|u| p ] 1,/p . 

For many numerical solvers the assumptions can be verified by application of 
Gronwall’s inequality, cf. [ 20111 ]. For notational simplicity, we consider the particular 
case in which (2.3) is autonomous , such that the coefficients on the right-hand side 
do not depend explicitly on time. Note that the results easily extend to the non- 
autonomous case, provided the given assumptions on 'll are uniform with respect to 
{\H n }^Li. The specialization is merely for notational convenience. 


2.2. Linear Gaussian case. Consider the linear instance of (2.1), in which 


^K) = Au n + 


(2.5) 


where A G R dx , and £ n are i.i.d. normal random variables with £1 ~ iV(0, £),£ G 
R dxd symmetric positive definite. This case arises when the coefficients of (2.3) are 
given by 


a(u) = < 3,1 u + ag and b(u) = bg. 


( 2 . 6 ) 


Again we suppress the possible time-dependence of the random maps 4/ and matrices 
A and E just to simplify notation. For this class of problems, given a Gaussian initial 
condition, the filtering distribution (2.2) is known to be Gaussian, and is therefore 
defined uniquely by its mean and covariance. Kalman filtering provides a two step 
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iterative procedure for computing the mean and covariance of v n +\ := u n + 1 10W — 
^n+i) S iven v n ■= u n \(Y n = l^ bs ) ~ N(fh n , C n ) where 

fh n := E [u n \Y n = F n obs ] and C n := E [_{u n - fh n )(u n - m n f\Y n = F° bs ] . 


The classical Kalman filter consists of a two-step formula which maps the distri¬ 
bution of v n to the distribution of v n +i. In particular, the map from ( rh n ,C n ) to 
C n +i) is described by the following two prediction equations and two update 
equations 


m n+ 1 = Afh n , 

m n -\-1 — (-f Kn+lH'jTthn+l T Rn-f-iyn-j-1, 


C n+ 1 = AC n A T + E 
C'n+l = (I — K n+l H)C n+1 . 


where the Kalman gain K n+ i is given by 

I< n+1 = C n+1 H T S-l v 


S n+1 =T + HC n+1 H J . 


Remark 1. When observations are in a lower dimension than the true signal, 
the iterations are more efficiently computed by introducing d n +± = y!^ 1 — Hm n +1 and 
updating the mean as follows 


— m n -|_i T 

It is nonetheless instructive to observe the alternative form presented above, as it is 
the properties of the operators I — K n+ iH which are responsible for stability of the 
algorithm 

2.3. EnKF. EnKF uses an ensemble of particles to estimate means and covari¬ 
ance matrices appearing in the Kalman filter, however the framework can be general¬ 
ized to non-Gaussian models. Let v n ^ and v Hi i respectively denote the prediction and 
update of the i -th particle at simulation time t n = n. One EnKF two-step transition 
consists not of the propagation of a mean and covariance as in the original Kalman 
filter, but instead the propagation of an ensemble {v n ,i}fLi ^ f] This 

procedure consists nonetheless in the prediction and update steps. In the prediction 
step, M particle paths are computed over one interval, i.e., 


v n+1 (u>i) = 'S>(v n (u) i ),oj i ) (2.7) 

for i = 1,..., M, where v n (iOi) := v n ,i denotes a realization corresponding to the event 
sample of the random variable v n : It —> R d , and \E'(-, uf) signifies the corresponding 
realization of the map for a given initial condition. Indeed the notation for random 
variable realizations, e.g. and £ ?l (ca t ), will be used interchangeably where confusion 
is not possible. The impetus for introduction of the latter notation will become 
apparent in the next section. For this presentation it suffices to assume a single 
infinite precision map, however there indeed may also be numerical approximation 
errors, i.e. may be used in place of T for some satisfactory resolution L. The 


1 Due to the implicit linear and Gaussian assumptions underlying the formulation, one may 
determine that it is reasonable to summarize the ensemble in its sample mean and covariance and 
indeed this is often done. In this case, one may construct a Gaussian from the empirical statistics 
and resample from that. 
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prediction step is completed by using the particle paths to compute sample mean and 
covariance: 


m“+i = 4K+i] 

C*+i = CovmK+i] 

where the following notations are introduced 

1 M 

e m[v\ ■= jj (2.8) 


Cov M [u,v] := E m [uv t ] - E M [u](E M [v]) T , (2.9) 

as well as the shorthand Covm[u) '■= Co vm[u,u]. The update step consists of 

computing (1) auxiliary matrices 

= HC^H t + r and = C™H T (S™ 

and (2) measurement corrected particle paths for i = 1, 2,..., M, 

Vn+l,i = Vn+l d" r ln+ l,ii 

Vn+1 ,i = (I~ K^{H)v n+hi + 


where {f?ra+i,i}££i are i.i.d. with ij n +i,i ~ iV(0,r). This last procedure may appear 
somewhat ad-hoc. Indeed it was originally introduced in [7] to correct the statistical 
error induced in its absence in implementations following the original formulation of 
the ensemble Kalman filter in [2] . It has become known as the perturbed observation 
implementation. Due to the form of the update, all ensemble members are correlated 
to one another after the first update. So, the ensemble is no longer Gaussian after the 
first update. The measurement corrected sample mean and covariance, which need 
not be computed, would be given by: 

™n+i = E M [v n + 1 ], 

C^+i = Co v M [v n+1 \. 


The sample empirical distribution is defined by 




MC 

n 


l 

M 


M 

y ^v n (u/i)> 

i—1 


( 2 . 10 ) 


and, for <p : —► K, the following shorthand notation is introduced G (<p) = 

f ipdgff c = Em[v u ]- It was shown in [SB] ES] that if U/ is of the form (2.5) and 
E|i>o| p < oo for all p > 2, then for all Lipschitz ip : —>■ R. and all p > 2, 


(E| ^ G (p)-p n (p)\v) 1,P <M-^, 


( 2 . 11 ) 


where g n is the filtering distribution. The notation f(M ) < g(M ) here is used to 
denote f(M) = 0(g(M)). 
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2.4. Multilevel EnKF. MLEnKF computes particle paths on a hierarchy of 
accuracy levels, in this case given by increasing refinement of the temporal discretiza¬ 
tion. Let v^, vi, respectively denote the prediction and update of a particle on solution 
level £ at simulation time t n . A solution on level i is computed by the numerical in¬ 
tegrator i4 +1 = Furthermore, let the difference operator for level £ be given 

by 


A e v n (u) := 




if £ = 0, 


u„(w) - 1 (w), else if £ > 0. 


( 2 . 12 ) 


Then the transition from approximation of the distribution of v n to the distribution 
of v n +\ in the MLEnKF framework consists of the predict/update step of generating 
pairwise coupled particle realizations on a set of levels £ = 0,1,..., L. However, it is 
important to note that here one has correlation between pairs and also between levels 
due to the update, unlike the standard MLMC in which one has i.i.d. pairs. This 
point will be very important, and we return to it in the following section. 

Similarly to the standard EnKF, the MLEnKF transition is between multilevel 


ensembles {[v e n {uj ti ),v e n 1 (u e ,i)]Z\}e=o ^ {[^+i(^,i)>^+i(^,i)]^=i}f=o> with the 


1 M t iL 


k := 0 for all k for ease of notation. This consists, as for EnKF, of 


convention that v 

the predict and update steps. In the predict step, particle paths are first computed 
on a hierarchy of levels. That is, the particle paths are computed one step forward by 


vi+\ {ue, i ) = 'S> i 1 (v e n 
v l+i (w,i) = 


(2.13) 


for the levels £ = 0,1,..., L and level particles * = 1,2,..., M( (where for convenience 
we introduce the convention that v := 0). Here the introduction of noise in the 
second argument of the v I j/ are correlated only within pairs, and are otherwise inde¬ 
pendent. Thereafter, sample mean and covariance matrices are computed as a sum 
of sample moments over all levels: 


™n+l =J2 E M e [&e v n+ l(w<,.)], 

^=0 

L 

Cn+1 = H CoVM ^ V ™+ l( W A-)] - Cov Mf bl+ 1 i(w£,.)], 


e=o 


where we recall the sample moment notation (2.8) and (2.9). 


It is necessary for stability of the algorithm that the sample covariance appearing 
in the denominator of the gain is positive semi-definite, a condition which is not 
guaranteed for multilevel estimators. This will therefore be imposed in the algorithm. 
It would be of independent interest to devise multilevel estimators which preserve 
positivity without such imposition. Let 


pML _ 


d 

E 

k=1 


A kqkql 


denote the eigenvalue decomposition of the symmetric multilevel covariance. Notice 
that the condition minfc(Afc) > 0 may not hold. Define 


4 ML = E X kQkql- 

fc=l;Afc >0 


(2.14) 
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It is worth noting that this is not the only way to do this, and it may be possible to 
use a less invasive and/or or less expensive method to guarantee non-negativity of the 
covariance. For example, banding [5], shrinkage |3Bj, thresholding [3], or localization 
[I] are some prospective alternatives. In particular, it will be necessary to consider 
such alternatives as the dimension grows and the cost of factorizing C^ L becomes 
a dominant consideration, but this is outside the scope of the present work. In the 
update step the multilevel Kalman gain is defined as follows 


77ML _ tt I / oiviL/ \ - 

Jx n+ 1 — u n+ l n Wn+ 1) 


where S'^ L 1 = H J + T. 


y n+1 1 


(2.15) 


Next, all particle paths are corrected according to measurements and perturbed ob¬ 
servations are added: 


= v obs + t/ 
£m+l,i i/n+1 ' '/n+l,i> 


= (I~ + K^yi +hi , 

vi+iM = (I- + K^yi +hi , 


(2.16) 


where {Vn+i i}f^i are i-i-d. with 1 ~ iV(0, T). It is in this step precisely that the 
pairs all become correlated with one another and the situation becomes significantly 
more complex than the i.i.d. case. After the first update, this correlation propa¬ 


gates forward through (2.13) to the next observation time via this ensemble. This is 


the conclusion of the update step of the MLEnKF, and this multilevel ensemble is 


subsequently propagated forward to the next prediction time via (2.13). 


The multilevel sample mean and covariance (in the case that (2.14) has not modi¬ 


fied the covariance, i.e. it has all non-negative eigenvalues without truncation) of this 
multilevel ensemble are given by: 


= '^2E Mt [AiV n +i(ue,. 
t=o 


= (I- K^H)m^ + [E Mo [y° +h . - y°%] + y° b + \] 


(2.17) 


C r+1 = (I- +iC^CovM 0 [j£ +1 ,. 

= (I- K^H)C^ + [Covm 0 [y° n+1 , - y°%] - r] 1 T . (2.18) 

The second term appearing in each case is unbiased. For computing general quanti¬ 
ties of interest, it is instructive to introduce the empirical measure of the multilevel 
ensemble {[vi(u)e ti )(u>e ti )]*i\}£ =0 , i.e. ^ 


.ML 


, M 0 


(wo,i) 


i =1 


M e 


M t ^ 
i —i »=l 


L 1 
+ 5Z W. 




(2.19) 


Then, the following shorthand notation for multilevel sample averages can be intro¬ 
duced. For any tp : —> R, let 


ML 


M := / 


(fdii: 


ML 


1 


m £ 


^=0 i=1 


:£-l 




2 Similar may be done for the predicting distributions, but the updated distributions will be our 
primary interest. 
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2.5. Nonlinear Kalman filtering. It will be useful to introduce the limiting 
process, in the case of nonlinear non-Gaussian forward model (2.1), i.e. nonlinear 
(2.3). The following nonlinear Markov process defines the mean-held EnKF [33] : 


Prediction 


Update ■ 


( V n+1 

= ^{V n ), 

< m n+ 1 

= E[u n+ i], 

{ c n+1 

= K[(v n+ i - m n+ 1 ) <g> (u„ + i 

*$n+ 1 = 

HC n+1 H J + r 

K n -\-l = 

Cn+i^S-l, 

Vn+1 = 

2/n+l + Vn+l 

^n+l — 

(I AA_i_i-H)u n -|_i T A n _|_i y n . 


( 2 . 20 ) 


( 2 . 21 ) 


Here {i7n}„=i are i.i.d. draws from iV(0, T). The expectations appearing above in 


(2.20) are with respect to the random variable v n +i, which depends upon the ran¬ 
domness from the initial condition vg = uo, the maps T, and fjo,... ,rj n . The observed 
value y obs is considered fixed and is not averaged over. It is easy to verify that in the 


linear Gaussian case of the Section [2T2] the mean and variance of the above process 
correspond to the mean and variance of the filtering distribution. Furthermore, it was 
shown in [381 S5 that the single level EnKF converges to the Kalman filtering dis¬ 
tribution with the standard rate 0(M ~ 1//2 ) in this case, as stated formally in (2.11). 
It was furthermore shown in 135] and |34j that for nonlinear Gaussian state-space 
models and fully non-Gaussian models respectively, the same convergence prop¬ 

erty holds , with the measure corresponding to v n in ( 2 . 20 ) and ( | 2 . 21 [ ) replacing p, n in 
(2.11), as long as the model satisfies a Lipschitz criterion as in Assumption [l] In this 
work, the aim is to show that the MLEnKF converges as well, and with a cost-to-e 
which is strictly smaller than its single level EnKF counterpart. The true filtering 
distribution of u n \(Y n = Y° hs ) will not appear in the remainder of this work, and the 
variable v n will correspond to the solution of the above system (noting that the two 
are equivalent in the linear Gaussian case). 

3. Theoretical Results. The approximation error and computational cost of 
approximating the true filtering distribution by MLEnKF when given a sequence of 
observations - ■ ■ ,y n will be studied in this section. The notation | ■ | will be 

used for standard Euclidean norm (and the induced matrix norm) and the covariance 
matrix of random variables Z,X G R d will be denoted 

Co v[Z,X] :=E[(Z-E[Z])(AT-E[X|) t ] , 

with the shorthand Cov[Z] = Co v[Z,Z\. Before stating the main approximation 
theorem, it will be useful to present the basic assumptions that will be used throughout 
and the corresponding standard MLMC approximation results for i.i.d. samples, as 
well as a slight variant which will be useful in what follows. 

Assumption 2. Consider the d-dimensional SDE (2.3) with initial data uq G 
U pe ^L p (Q). For the hierarchy of solution operators defined in Section Q let SSf* denote 
a numerical solver using a uniform time step Afi = 1/Ne with Ng/Ng-i > N > 1 for 
£ = 0,1,.... Let T denote the set of functions <p : —> R which, for all £ > 0 and 
all u, v G U P £N-k p (^); and a given set of constants > 0 with a > min(/3, 7 )/% 

fulfill 
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< 


N e a , and |E[</j(m) — tp(v) 


< 


N~ a 

ly L > 


provided 


(i) |E[^(^(u)) - ¥>(^(t>))] I 
\nu~ V }\<N^ a - 

(ii) ||v?(5' f ('u)) - <^(1 ,£-1 (i;))||p < iV7 /3/2 , for allp> 2 ; 

(iii) Ct := Cost (^(u)) < Nj ; 

where, as stated above, the notation f(M) < g(M) here is used to denote f(M) = 
0(g(M)). Assume further that all monomials of degree less than or equal to 2, are 
contained in T. 

Remark 2. An implication of the above condition (ii) is that condition (i) holds 
with a = p/2. However , for many numerical schemes, there are settings where it is 
possible to achieve rates a > (3/ 2 (implemetationally, this may yield savings in the 
computational cost). The literature J321 Theorem 14-5.2] and f 201 Chapter 7] provide 
sufficient regularity conditions on the SDE problem and for the the Euler-Maruyama 
method to achieve the rate exponents a = 1 and /3 = 1, and the Milstein method to 
achieve a = 1 and /3 = 2. 

We will now state the main theorem of this paper. It gives an upper bound for 
the computational cost of achieving a sought accuracy in L p -norm when using the 
MLEnKF method to approximate the expectation of an observable. The theorem 
may be considered an extension to the data assimilation setting of earlier “one-step” 
cost vs. error results in multilevel Monte Carlo, cf. m Theorem 3 . 1 ] and m Theorem 
1 ] . To reduce the number of repetitions in the below proofs we notice once and for all 
that the process itself is in L p by Assumption [l] hence the realization giving rise to the 
observations u n and the observations themselves y n are as well, for n = 1, 2, ..., N. 
It follows from this and the finite norm of K n pJSJ [MJ [35] that the elements v n and v n 


given by (2.201 and (2.21) are also in L p for n = 1,2,..., N. It will be assumed that 


the update comes at a marginal cost with respect to the prediction. This may be the 
case for complicated forward solution with small error tolerance, large ensemble, and 
comparably modest dimension d. 

Definition 3.1. A function —> R is said to be locally Lipschitz continuous 

with at most polynomial growth at infinity provided that there exist positive scalars 
v, Ctn < oo such that 


\ip(x) - <p(y)\ < C v \x-y\(l + \x\ v + \y\ v ), \/x,y £ 


(3.1) 


The notation f(M) ~ g(M) will be used to indicate that there exist constants 
ci, c -2 >0 such that c\g(M) < f(M) < c 2 < 7 (M). 

Theorem 3.2 (MLEnKF accuracy vs. cost). Suppose Assumptions [7] and^hold. 
For a given e > 0, let L and {Mf}^ =0 be defined under the constraints L ~ log(e _1 ) 
and 


' 

' Nf a , 

if 

P > 7 > 

/3 + 2t 

Nl 3 

L 2 Nt, 

if 

P = 7, 

1 


if 

P < 7 - 


(3.2) 


Then for all functions <p £ T that are locally Lipschitz continuous with at most poly¬ 
nomial growth at infinity, cf. Definition \3.1\ we have that 

\\pT(v)-Tnm P <\lo g (eWe, (3.3) 


where is the multilevel empirical measure defined in (2.19), where the samples 
are given by the multilevel predict (2.13) and update (2.16) formulae, approximating 
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the time t n = n mean-field EnKF distribution /j, n (the filtering distribution p n = 
N(m n ,C n ) in the linear Gaussian case). And the computational cost of the MLEnKF 
estimator over the time sequence satisfies 


fe 2 , if P>1, 

Cost (MLEnKF) < < e~ 2 |log(e) | 3 , if /3 = 7 , 

if /3< 7 . 


(3.4) 


Remark 3. The growth in error factor |log(e)|" in (3.31 is due to a propagation 
of perturbed observation errors of the MLEnKF estimator that has been conservatively 
bounded by the triangle inequality in (3.28). In our numerical tests we do however 
not observe the error growth factor, and therefore we believe it might be possible to 
eliminate this factor by sharper theoretical bounds. 

The proof of Theorem 3.2 follows roughly along the same lines as that of [35] . 
however with more notation and longer calculations due to the multilevel aspect. The 
proof also has connections to the work [B], in which an MLMC method is developed 
for estimation of higher order central moments. 

It will be convenient to introduce the mean-field limiting multilevel ensemble 
{[u^(o;^ i i),u^ _1 (w^ j ,:)]]'f\}^ =0 , [35] [34) 38], which evolves according to the same equa- 


Q, (uty 

tions with the same realizations of noise except the covariance C n , hence the Kalman 
gain K n , are given by limiting formulae in ( |2.20 ) and (2.21). That is, the intra¬ 
level pairs of ensemble members u£ -1 (u;^ i j)) are independent and identically 

distributed (i.i.d.) over index i, and they are independent between levels. An ensem¬ 
ble member Vjfi 1 (tJJi,i)) maps to (v£ +1 (w*,0> ^+ 1 (^, 1 )) as in ( |2.13[ ). Then 

(u^ +1 (wg i), urT* iA’t i)) is obtained as in (|2.16|, except with K n+1 from (|2.21|) replac¬ 
ing K™ L 1 in ( |2.16[ ). The noise realizations {u>e,i} are assumed to be the same as the 
EnKF ensemble member The sole difference is that the limiting 

ensemble is independent between levels and the pairs within a level are i.i.d. This 
is because the covariance and gain come from the infinite limiting system (2.20) and 
(2.21). The only correlations are between and u^ _1 (w£ 7 ), due to the 

Hence there is no multiplicative propagation of correlations within a level or between 
levels. This crucial fact allows to (a) on the one hand extend standard multilevel 
theory for i.i.d. draws over multiple updates, and (b) on the other hand, establish 
the required proximity of the two multilevel ensembles particle-wise, based on conver¬ 
gence of the random gains to the deterministic ones K n . The latter will require 
the greatest effort and will dominate the proof by means technical lemmas. Note that 
11 v l n || p , ||p, \K n \ < oo, following from Assumptions[I] 

The first step is to bound the multilevel predicting covariance in terms of its 
constituents, the ensemble members. The gain is then bounded in terms of the co- 
variance, and ultimately the updated ensemble in terms of the predicting ensemble 
and the covariance. The rate appears only by virtue of the convergence of the i.i.d. 
ensemble covariance, and it is propagated forward by induction. Only the predicting 
covariance will be considered and hats will be omitted to avoid unnecessary notation. 

Recall the multilevel Kalman gain is defined as follows 


K™ L = C„H' + T) 


\-i 


where 


4 ML = 51 X kQkql, 

k= l;Afc >0 


(3.5) 
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for eigenpairs {A*,, q^} of C“ L . The following micro-lemma will be necessary to control 
the error in the gain. 

Lemma 3.3 (multilevel covariance approximation error). Let be given 


by (3.5). Then the following holds 


| C ML _ < |C ML _ C) 


(3.6) 


where \ ■ \ denotes the induced 2-norm for matrices. 

Proof. Notice that 

|C' n ML -C n ML |=niax { , ;Aj< o } {|A,|}. (3.7) 

Denote the associated eigenvector by w max (normalized to |u max | = !)■ Notice that 
for any A = A 7 , 


. .. \vfAu\ .. . 

\ A \ = I |2 = max fc |A fc | 


where Afc are the eigenvalues of A. Since C n > 0, one has that 

l, t (n ML — c in i — u T c ii — if 7 r ML n > ir ,ML — c 

l u maxt 0 n W/“max| — “ ma x u n u ma.x “max u n “max ^ I'-'n 


ML | 


□ 

The next step is to bound the gain error, which is done in the following lemma. 
Lemma 3.4 (Continuity of the gain in the covariance). There is a constant 
c n < oo, depending on |i4|,7min, and \K n H\ such that 

\K™ 7 -K n \ <c n \C™ 7 -C n \, (3.8) 


where 7 m i n > 0 is the smallest eigenvalue ofT. 

Proof. Recall that 

K n - K™ 7 = C n H 7 ((. HC n H 7 + T)" 1 - (HC™ 7 H 7 + T)" 1 ) 

+ (c n - c™ l )h t (hc™ l h t + r )-\ 


(3.9) 


where C^f 7 > 0 is defined in (2.14), and notice that 


(. HC n H 7 + T)" 1 - (HC™ l H t + T)" 1 

= ( HC n H 7 + T)- 7 H(C™ 7 - C n )H T (HC™ 7 H T + T) _1 . (3.10) 


So 


K n - K™ L = K n H(C ,- C n )H T (HC™ 7 H T + T)" 1 
+ (Cn ~ C™ 7 )H T (HC™ 7 H T + T)- 1 . 

Note that a" T (T + B)x > x^Tx > 7 m i n for all x £ M. d whenver B = B 7 > 0, and this 
implies that \(HC^ l7 H T + T) _1 | < l/7 m in- It follows by (3.6) that 


I H 


I Kn - K™ 7 I < -1—^(1 + 2\K n H\)\C n - C, 

7min 


ML | 


□ 
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It is worth noting that the multilevel gain error is bounded by the unmodified 
multilevel sample covariance error, following from Lemma |3.3[ so modification in 


(|3.5 ) will not affect the ultimate approximation error. 

Theorem 3.5. Suppose Ass ump tions [7] and\2\ hold. For any e > 0, let L and 


{M#}f_ 0 be defined as in Theorem 3.2. Then the following asymptotic inequality holds 


\\CT-C n \\ P <e+\\C^-C, 


"/ML 11 


(3.11) 


with a cost which satisfies (3.41. 


Proof. Let C% denote the predicting covariance of the final L th level limiting 
system at time n, in the sense that the forward map above is replaced by T L , but 
the gain comes from the continuum mean-field limiting system. Furthermore, let 
denote the covariance associated to the multilevel ensemble {(v^ The 

triangle inequality is used to split 


|CT - C n \ < I Cf - c n \ + |CT - Cf I + - C t 


=VML| 

y n 17 


(3.12) 


and each term will be dealt with in turn, in the following three lemmas. The proof 
of the theorem is done after establishing Lemmas |3.6[ |3.7| and |3.8[ which provide the 
asymptotic bound on the first two terms. □ 

Lemma 3.6. Suppose Assum ptio ns [7] andf2j hold. For any e > 0, let L and 


0 be defined as in Theorem 3.2. Then, for any finite n, 


max 


(|E[r 


and for l = 0,1,..., L and all p >2, 


.0 —P _1 I 

max I \\vz, — vt, 1 


Proof. Since the initial data v 0 = v(f = vq = vq is the same 


E 


Now assume that the following holds 


V n ] 

, E 



lip ’ 

v n - 

-L 


= v 0 

= V 0 

'^L 

1 

v o - 

- no 


v„ - v r 


^ i-i 


< 


<N, 


- 0/2 


(3.13) 


(3.14) 


= 0. 


max l 


E 


v n- 1 v n -1 


, \K[Vn-l - Un-l]|) < S 


Assumption [2] (i) directly implies 

|E[^ - v n ] | < e. 

Futhermore, since \K n H\ < oo for any finite n, 


E 


V„ - Vr, 


< 1 1- K n H | |E [v„ - u n ] | < e 


and inequality (3.13) follows by induction. 


To prove inequality (3.14), recall that due to the matching initial data, the in¬ 


equality holds trivially at n = 0. Assume 


max i iK_i - ll p > 


V -! - V„ 


<N, 


- 0/2 
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By Assumptions [I] (i) and [2] (ii), 


1*4 ~ v t Ip < 


p 


<N, 


-P/2 


^JL -1 


< 1/ - K.„H\ II vi - vt 1 II < N7 0/ \ 


and inequality (3.14) holds by induction. □ 

Lemma 3.7 (Covariance discretization error). Suppose Assumptions^ and^hold. 
For any £ > 0, let L be defined as in Theorem \S.2\ Then the following asymptotic 
inequality holds 


\Cit-C n \<e. 


(3.15) 


Proof. It is possible to show that for any symmetric matrix A, the following 
inequality holds 


|A|< £ |A«'|. 

jJ '=i 

Furthermore, by adding the terms ±E[(u n ) J "] E[(u^')- 7, ] ) 


(3.16) 


\(C^C n r'\= E UvWtfZ 


-E 


{v n y(v n y 


- e [i^y] e Uriy'] + e [{v n y] e Uv n y' 


(3.17) 


< 


+ 


E 


E 


( U n) J ( W n) y - (0 J K) y 


\<Y] M(Vn) j (Vn) j ) | + |E [(V n y] | E UvW - (v n f 




The last inequality follows by Lemma 3.6 and Assumption^ (i), noting that T contains 
all monomials of degree less than or equal to 2. 

□ 

Notice that 


C n =Y1 Cay \- V n] ~ Cov bn *.], 


i =0 


and 


C n L = CaV M t [vi] - CoVmA^u ^ 


t=o 
„-l _ 1 


with the convention that v 1 = v 1 := 0. Consider also the partner covariance to the 
above 


C n h = Gow mA v Ii\ ~ Co y Mt [vi A- 


1=0 
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The next two differences are bounded in terms of the single-level differences, using 
the triangle inequality to extend to the sum. 

Lemma 3.8 (multilevel i.i.d. sample covariance error). Suppose Ass umpt ions [7j 
ZioZd, and for any e > 0, let L and {Mg}f_ 0 be defined as in Theorem 
the following asymptotic inequality holds 


3.2 Then 


IIC-c, 


n IIP 


5, £- 


(3.18) 


Proof. Notice the following triangle inequality 
L 

- Cn I < E \ G ° V M e [v e n\ ~ CoVMAvt 1 ] ~ (Cov Mf [<] - Cov M Jt^ _1 ])|. 
£—0 


To avoid needlessly long terms when bounding the summands of the above equation, 
we now make the assumption in this proof that E = 0, without loss of generality. 
We may then obtain the rearrangement 


Cov Mf K] = Em, [vi(vi) T ] - E Ml [vi\(E Mt [vi]) 


and similarly for the i — 1 term. Using the identity acA — bfA = | [(a + h) (a — 6) T + (a — 
b) (a + 6) T ] for a, b S R d on each of the outer products with £,£ — 1, respectively, and 
then using (3.16) again for the first term, and Cauchy-Schwartz for the second (and 
grouping like terms arising from the (j,f) — > {j',j) symmetry of + a- 7 fr 7 )), 

one has 


Cov M , K] - Cov Mf K I - ( Cov K] - Cov[u n ]) 


j<T =i 


+ 


Em e [(i{ + v e n 
rf _ V 


(vi-vt-y]-E (vi+vt'y (vt-vty 


+1 E M y n + viY ] ]\ EmA« -] 


+ 


EMAivt+vPy'} i EMA«-<~y] 


(3.19) 


Almost sure convergence follows by the law of large numbers. The rate in L p is shown 
now. 

First, it will be necessary to recall the Marcinkiewicz-Zygmund inequality: for 
i.i.d. random variables X\, ..., Xn ~ X with ||X|| P < oo for p > 2, and E[X] = 0, 

||^iv[X]|| p <c p Ar- 1 / 2 ||X|| J) , (3.20) 


where the constant depends only on p, cf. (5JE23; in fact, c p < Sy/2p, cf. [ Hd] . 

Using the Marcinkiewicz-Zygmund inequality then Holder’s inequality on each of 
the first two terms on the right-hand side o f (|3. 19 ), then the reverse order on the last 
two, and finally the Assumptions [ 2 J Lemma 3.6 and the fact v\i £ L p (Vt) for all p > 2, 
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together yield 

d, 

l|Cov M J^] -CovmJ^ 1 ] - (Cov[t4] -Cov[^ _1 ])|| p < ^ Mfi 1/2 

3<3 '=1 

c p (ll« +vty h P \\ (v e n - rf~ i y ii 2p + ii (4 + v e ~ i y ii 2p ii {v e n - v e n r 1 )Y h P 


+ c 2 2p M t 1/2 (nK + < YiwiK-i^ TII2p + II« + v l n y'hpWivi-vi 1 ) J || 2 p 


(3.21) 


< M7 1/2 N7^ /2 . 


Finally, by the triangle inequality, the following bound holds for (3.18) for all 
P > 2, 


rr - C£ || P < £ M~ 1/2 N~ p/2 < e. 


e=o 


The previous two lemmas complete the proof of Theorem |3.5| Now we turn 


to the next term in (3.11), the difference between multilevel ensemble covariances, 
which is continuous in the individual ensemble members. First it will be necessary to 
recall (see e.g. Lemma 4.3 of [35]) that for identically distributed random variables 
Xi ,..., x N G R d , 


(e E n [|a 


l i/p 


1/9 


< X, 


— Il^nllr? 


(3.22) 


where r = max{g,p}. 

Lemma 3.9 (Continuity of multilevel sample covariances in particles). Suppose 
Assumptions [7] and[^ hold, and for any £ > 0, let L and {Me}f_ 0 be defined as in 
Theorem\3.2\ Then the following asymptotic inequality holds for all p > 2, 


IIC L - cTIIp < E (H< ~ <11 p + 4 H< - <ll2pll<l|2p 

1=0 V 

+ Wvt 1 -<" 1 ||p + 4 ll«n" 1 -< _1 ||2p||4 _1 ||2p^. 


(3.23) 


Proof. Recall first that 
L 

K ML - C™\ < I Cov mJ<] - CovmJu^ 1 ] - (Cov M J<] - Cov Mf [*4 _1 ])| ■ 

1=1 

Now the individual terms will be bounded. Note that 

Cov M Jt/] = E Me [v e n (v e n ) T ] - E Me [v e n ](EM e [v e n ]) T , 

and similar for v!/. Using a 2 — b 2 = (a — b) 2 + 2b(a — b) with a = uJv^ and b = ufv^ and 
again with a = (1/Me) ^ v n,i and & = (1/Me) uT ^n,i f° r arbitrary u G R d , 
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these terms are rearranged as follows 

vJ (CoVMf [4] - Cov M e [Vn]) u 


= EmAW (4 - 4)1 ] + 2 EmAW 4)4 (4 4))] 

- I E Mt [4(4 - 4)] I 2 - 2 E Me [u T v e n }E Me [4(4 - 4)]- 


Then, using the Cauchy-Schwartz inequality, the first term of (3.12| is bounded as 
follows 


|Co V M ,[4] - Co V M ,[4 x ] - (Cov M j4] - Cov M e [vi x ])| 

= I Co v M , [4] - Co + Cov Mf [?( _1 ] - Cov M<l [4 _1 ]| 

< E Ml [ |4 - <| 2 ] + 4y[ !4| 2 ]-Em, [ 14 - 4| 2 ] 

+ [ K _1 - < _1 | 2 ] + EM e [ (4" 1 fjEMAV't 1 - 4” 1 r ]. (3.24) 

After rearrangement, the triangle inequality, (3.22) with p = max{p, 2}, and Holder’s 
inequality complete the proof. 

□ 

It has just been shown that the second term of (3.11) is “close in the predict¬ 
ing ensembles”. Therefore, the error level of the first term will carry over between 
observation times by induction. This is made rigorous by the next lemma. 

Lemma 3.10 (Distance between ensembles.). < Suppose As sum ptions^ an d^hold, 
and for any e > 0, let L and {Me}g =0 be defined as in Theorem 3.2. Then the following 
asymptotic inequality holds for all p > 2, 


E 

e=o 


< |log(e)| n e. 


(3.25) 


Proof. First recall that the assertion holds trivially for n = 0. Proceeding by 
induction, assume for p > 2, 


E| 

e=o 


v n -1 - v 


n—1 


< |log(e)|” 1 e, 


Then Assumption |T](i) implies the following inequality holds for the prediction 


E 


V„ - v„ 


< c^e 


V„, - v„ 


< |log(e)r 1 £■ 


(3.26) 


e=o f=o 

Using Lemma [3~~fj the following inequalities hold for l = 0,..., L, 

l4 - V n \ <\I- K n H\\vi - 41 + c„|C L - C n |(|4 - 41 + 14 - Hvi |). (3.27) 
By Holder’s inequality and since 4,4 e L p (fl) for all p> 2, 

114 - 4||„ < |J -K n H\\\v e n -4|| p 


+ cjcf L - c n 14(114 - 414 +114 - #414) 
< 114 - 44 + II cr - 4|| 2 p (n4 - 4i| 2p + 1 ). 
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Plugging the moment bound (3.26) into the right-hand side of the inequ ality (3.23) 


yields that ||C'„ L — L || 2y > < |log(£)|" e, which in combination Theorem 


leads to \\C™ h -C n \\ 2 p < |log(e)| n 1 
again for p, 2 p 


3.5 


further 


e. Therefore, summing the above and using (3.261 


L 

Ei 

f=0 


Vr, - V. 


L 

£=0 


I vi - vi 


|2 p 


IVn-Vnhp) 


< |log(e)|" + 

e=o 

< |log(e)|"e, 


(3.28) 


|2p , 


where the last inequality of the proof uses that ||— Hv„ || 2 P < 1 and L ~ |log(e)|. 

□ 

Induction is complete on the distance between the multilevel ensemble and its 
i.i.d. shadow in L p , and it remains only to close the argument, which is done next. 
Note that the induction actually holds for all n, but we are able to neglect the n- 


clependence of the constant c n appearing in (3.27) by considering only a finite number 
N of steps. 


Proof. [Proof of Theorem 3.2 


What remains is to verify that provided L and Mi are defined under the con¬ 


straints in Theorem 3.2 the error bound (3.3) will be obtained for all the functions 


ip £ T which are locally Lipschitz continuous with at most polynomial growth at 
infinity, cf. Definition |3.1| Notice that the triangle inequality gives 

M\\ P < WpTM - ^ L MIIp 


+ [|/C» - HMWp + HUM - MnMIU 


(3.29) 


where denotes the empirical measure associated to the i.i.d. ensemble, and p,% 
denotes the probability measure associated to v L . Before treating each term sepa¬ 
rately, we notice that the two first summands of the right-hand side of the inequality 
relates to the statistical error, whereas the last relates to the bias. 


The first summand of (3.29) satisfies the following bound 


WpT(‘P)-pT(‘P)\\p = 


E Em I \^n) - *) - - p(v e n *)) 


e=o 


L 




< > ' IIV’(^) - ^n)\\ P + Wvffin X ) - 


k/-1. 


t=0 

£=0 


\V„ - I’ 


^i-1: ,^e-i 

K - v n | [1 + \v n 


ir) 

)- 


+ K - v. 


i^+ii 


,~e- 1 ~e-i 
\vi - v„ 


v+l\ 


< C 


Lj 

E[i 

£—0 


v„ - v„ 


|2p 




(l + ll V nll2 p! /) 
)- 


+ \\vi-v. 


\v+l 


n '■n\\ p (v+l) 


(l + ll€ _1 | 


ie i -c _1 ii^ i + D 


< |log(e)| n £. 


(3.30) 
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The second inequality follows directly from the expression (3.1). The third inequality 


comes from the triangle ine qualit y and Holder’s inequality, and the fourth inequality 
arises directly from Lemma 3.10 and the boundedness of v n in L p for p > 2. 

For the second summand of (3.29|), notice that we can write Tn~f4r\ 


where pf n is the measure associated to the level i limiting process v ( . Then, by virtue 
of (3.201 and condition (ii) of Assumption [2} 


-ML 


(^)-^ll P <£||^ <P{K)~<P(K ) - E [^J - <P(Vn )] 

L 


r 1 / 2 || V(Vn) - V{Vn ^ 


£=0 

L 


(3.31) 


<cJ2 M > 


—1/21.^~€ 

- v„ 


e=o 


< 


e=o 


- 1/2 N7 0/2 


< s. 


Finally, for the bias term, 

\\Pniv) - Pn{<P)\\P = \P-niv) ~ Tn{v)\ = 


E 


<P(v n ) ~ V(Vn) 


5, e, 


(3.32) 


where the last inequality follows from Lemma 3.6 and Assumption [2] (i). 

Putting together (|3.30|), (|3.31|), and (|3.32) in (|3.29|) yields the sought bound 
in (pL3|). □ 


Theorem |3.2| shows the cost-to-e performance of MLEnKF, and to verify that it 
generally outperforms EnKF in this performance measure, we end this section with a 
comparable result on the cost-to-£ perfomance of EnKF. 

Theorem 3.11 (EnKF accuracy vs. cost). Suppose Assumption [7J Assump¬ 
tion [1] (i), and Assumption ^ ] (in) hold. For a given e > 0, let L and M be defined 
under the constraints L ~ log(e _1 )/a and M ~ e~ 2 . Then for all functions ip £ T that 
are locally Lipschitz continuous with at most polynomial growth at infinity, cf. Defi- 
nitionW .71 we have for any p > 2, 


\\^(T)~Tn(T)\\ P <s. 


(3.33) 


Here denotes the EnKF empirical measure defined in (2.10), where the samples 


are given by the EnKF predict formulae at resolution level L (i.e., with the numerical 
integrator A> L ), approximating the time t n = n mean-field EnKF distribution p n . The 
computational cost of the EnKF estimator over the time sequence satisfies 


Cost (EnKF) < £ -( 2 +t/«). 


(3.34) 


Proof. [Sketch of proof] By the triangle inequality 

II PnW) - P-n C (<P)\\P < \\Pn(<P) ~ Tniv)\\ p ) + \\p L (<P) ~ hC G iv) 

+ || fC c (<p) - p™ c (v)\\ p =--i + n + HI, 
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where denotes the empricial measure associated to an EnKF ensemble {v n (uj i )}f£ 1 

and fin denotes the emprical measure associated to v n . We bound the terms I, II, 
and III individually. 

For the first term, we have 




5, e, 


I = | Mn(¥>) - Mn(<P) | < E <P(Vn) ~ 
where the last inequality is implied by inequality (13.13 1 of Lemma 3.6 which it is 


y (|3l3 

'ffiT 


straightforward to verify holds under Assumption I 

For the second term, we first note that we may assume without loss of generality 
that <^(0) = 0. Since <p is locally Lipschitz continuous with at most polynomial growth 
at infinity, there then exists positive scalars v, C v such that 

\v(?)\ < C v { 1 + \x\ v ). 

By inequality (3.20) and since v% G L p (fl) for any p > 1, 


II < 


e m[<p(v^)\ - E [ v ?( w ) 


< M~ 1/2 




< Ge 


1 + 


5, s. 


For the last term, let us first assume that for any p > 2 and finite n, 


vh-v n 




(3.35) 


for the particle dynamics v^ and v„ respectively associated to the EnKF ensemble 
an( i the mean-field EnKF ensemble {v n i }^ =1 . Then the assumed regularity 
of <p, that Vn,^. G L p (fl) for all p > 2, and Holder’s inequality yield that 


III = 


< 




e m[v(vJ - P(«J] 


< C v 


vit - vt 


( J + \vh[ 


V„ - V„ 


< £• 


All that remains is to verify (3.35). Since this can be done by very similar steps as in 


the proof of inequality (3.251, we omit this verification. 

□ 

Remark 4. Notice that for a given n one can obtain an error 0(e) for MLEnKF 
in (3.3) for an additional cost which is given by replacing e by e|log£| _n in ( |3.4[ ). 
Furthermore, it is worth noting that, for any m and for any 6 > 0, | log£| m = 0{e~ a ). 


Hence one can obtain a cost-of-error rate in (3.4) which is uniform in time and 


asymptotically superior to EnKF (3.34). 


4. Numerical Examples. In this section the performance of EnKF and MLEnKF 
are compared on some very simple numerical examples in terms of computational cost 
vs. approximation error. First, in section |4~lj underlying dynamics from an Ornstein- 
Uhlenbeck SDE is considered. Next, in section [4~2| the underlying dynamics geomet¬ 
ric Brownian motion is considered. Both of these examples are indeed analytically 
tractable, however they are approximated as though they were not. This provides a 
solid benchmark to compute errors and allows the theory to be illustrated. 
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4.1. An Ornstein-Uhlenbeck SDE. We first consider the simple Ornstein- 
Uhlenbeck SDE problem 


du = — udt + <rdW t , u(0) = 1. 


(4.1) 


It has the exact solution 


and since 


u(t) = u( 0)e 4 + / ae^ s ^dW s , 

Jo 

[ ae^dWs- iv(o,y(l-e- 2 )), 

j 0 V " V 


=:£ 

one SDE realization sampled at the observation times t n = n is generated by the 
linear solution operator 

U n +1 = e^Hn + =: ^(u n ) 

where £„ ~ iV(0, £) i.i.d. The corresponding noisy observations are given by 

y° hs = u n + rj n , 

with r] n ~ A(0,T) i.i.d. 

For the MLEnKF algorithm, a hierarchy of Milstein solution operators 
are introduced, where the 7 th level solution operator uses a uniform time-step of size 
Af 4 = A numerical integration step takes the form 


<rn+1 = <m(l - At 4 ) + aA< )TO , m = 0,1 ,..., 2 4+1 - 1, 
where the initial condition is given by v l n 0 = t^_ l5 

A Wi m = W(t n + (m + l)Af 4 ) - W(t n + roAf 4 ) ~ A(0, At 4 ), 


(4.2) 


and zt£ = 


; , 2 <+ 1 ’ 


Moreover, since the solution operator for (4.1) is linear, the gold standard becomes 
the conventional Kalman filter update 


(ml Si) = ((I - K n H)m n + K n y° hs , (7 - K n H)C n ), 


n= 1,2 ,.... 


Problem parameters. In the numerical experiments, N = 100, 200 and 400 
observation times {t n = n}l =1 are used, and the covariance parameters are set to 
r = 0.04 and a = 0.5. For a prescribed computational cost O (J), an EnKF ensemble 
of size M = O (J 2 / 3 ) is solved by the Milstein method on a mesh At = O (J 1 / 3 ), and 
for the MLEnKF method, we set L and Mg according to the constraint in Theorem|3.2[ 


Approximations of the mean and covariance. In our first numerical experi¬ 
ment we approximate the gold standard mean and covariance for a single observation 
realization using the respective ensemble Kalman filtering methods, and measure the 
approximation error in terms of the root mean square error (RAISE): 


\ 71=1 


N ,-f ™ 

m'n - TO„ 


N 


N iAt _ r< 

\y - |C y n c.' n 


\ ^ 

\ n—1 


N 


(4.3) 
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with ( fh n ,C n ) denoting a single realization of either the EnKF or the MLEnKF up¬ 
dates approximating the gold standard moments. These observables are sufficiently 
smooth to reach the rates a = 1 and /3 = 2 with the Milstein method, cf. f2U]. 
The respective decay rates are numerically verified over a sequence of times in Fig¬ 
ure 4.1 Figure 4.2 presents a numerical performance study measuring RMSE (4.31 


vs. computational cost for the respective methods. As is to be expected from Theo¬ 
rem |3.2| the decay of RMSE for the MLEnKF method as a function of the cost J is 
roughly 0( J -1 / 2 ), orders of magnitude faster than the observed and expected EnKF 
decay rate 0{J ~ 1 - /3 ). Note that the error growth factor |log(e)|' 1 from the theoretical 
bound (3.3) is not visible in the experiments. In fact, the constant is even stable 
(the shift in cost as measured by runtime is simply due to computation of additional 
updates), indicating that with a more careful analysis the present results may be 
extended to an infinite time horizon. 






Figure 4.1. N umer ical estimates of the decay rates over a sequence of times for the problem 
presented in Section \f.l\ with y>(v) = v. The computations use M = 10 6 particles on every level. 


Approximations of the excedence probability. In our second numerical test, 
we approximate the mean of the observable (p(u n ) := 1 {u n > 0.1}, which corresponds 
to the excedence probability P(w n > 0.1) = 1 — >I>((0.1 — frV n )/\/ Cn). The Milstein 
method achieves the weak rate a = 1, but while one may show for p = 2 and any S > 0, 


||^(u))^^- 1 (u))|| p <lV, 


(l-*)/2 


cf- m E )1 the low regularity of the observable 


implies that there does not exist a (3 > 0 fulfilling condition (ii) of Assumption [2] for 
all p > 2. A numerical infere nce of (3 = 0 can be made from the numerical estimates 
of the decay rates in Figure 4.3 where we see that the decay rate of ||(r 1 )) — 
(/?(4' f_1 (u))|| p consistently decreases towards 0 as p increases over a sequence of times 
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Figure 4 . 2 . Comparison of the accuracy vs. computational cost when using the EnKF and 
MLEnKF met hods on the filtering problem presented in Section \4.1\ The error is measured in terms 
of the RMSE ( | 4 . 3 | > for the mean (top row) and covariance (bottom row), computed with N = 100 , 200 
and 400 observation times in the first, second and third column, respectively. The computational 
cost is measured in computer runtime. 


(while ql ~ 1). Theorem 3.2 does therefore not cover the given approximation problem. 
Nonetheless, implementing with the rates (3 = 1 and a = 1, a numerical comparison 
of the performance of EnKF and MLEnKF approximating the excedence probability 
is presented in Figure |4.6[ A near optimal RMSE decay rate, slightly slower than 
is again achieved for the MLEnKF method. 


4.2. Drift-alternating Geometric Brownian Motion. We next consider the 
SDE 


du(t + n) 


a 2 u(t + n)dt + au(t + n)dW(t + n), if n is even, 
au(t + n)dW(t + n), else, 


for t G (0,1), 


( 1 - 1 .) 

and with the initial condition u(0) = 1. This equation is analytically tractable as 
well, and the solution of the transformed equation 2 = log u is given via Ito’s formula 
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Figure 4.3. Numerical estimates of the decay rates over a sequence of times for the problem 
presented in Section \4-l\ when approximating the excedence probability P (u n >0), i.e., with ip(v) = 
l{v > 0.1}. The computations use M = 10 6 particles on every level. 


W 

m 

s 



Figure 4.4. Accuracy vs. computational cost comparison of the EnKF and MLEnKF meth¬ 
ods on the filtering problem presented in Section \^.1\ when approximating the excedence probability 
P(itn > 0.1) over N = 200 observation times. The error is measured in terms of the RMSE and the 
computational cost in computer runtime. 


by 

2 

dz{t + n) = (—1 ) n —dt + adW(t + n). 

Defining ~ 7V(0 ,ct 2 ) i.i.d., one has that 

(j2 

z n +1 = Z n + (-1)" Y + in ='■ Vn(z „), with Z 0 = log U 0 = 0, 
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and the solution of (4.4) can be obtained via exponentiation: u r , 
noisy observations for u n are introduced on the form 


e 2 ". Moreover, 


Vn — u n c^ , 


and rj n ~ N(0,T) i.i.d. Which, upon defining y° bs = logy n , yields the following 
relation to noisy observations of z n : 


y 


obs 

n 


Zn +Vn- 


As the SDE (4.4) does not fulfill the linear Gaussian constraints (2.6) but z = logu 
does, we will here update the ensemble oi z = logu processes. However, to add 
some artificial difficulty to the problem, the numerical integration is done on the u 
ensemble: 


(i) 

(ii) 

(hi) 

(iv) 


Numerically integrate a (multilevel or single level) ensemble u n _i —► u n 
Compute sample mean and covariance of z n \Y£fi\ using the z n = logit, 
semble. 

Update the ensemble z„ 


en- 


U 


obs 


by the new information provided by the observation 


Compute the initial condition for the ensemble u n = e 2 " and return to (i). 

Remark 5. The numerical integration of the GBM process in step (i) above 
introduces an artificial difficulty in the filtering problem since the integration may by 
other means be solved exactly. In practice, this does of course not make sense, but 
our purpose here is simply to numerically validate the performance of the MLEnKF 
method on a set of simple filtering problems for which reference solutions exist. 

Numerical integration of u n is done by the hierarchy of Euler-Maruyama schemes 
introduced in ( |4.2| ) (applied to the GBM problem, the schemes are Euler-Maruayama, 
while applied to problems with additive noise, the schemes are Milstein), here with 
the slightly finer mesh hierarchy A t e = 2~ 3+( , since the problem less stable. The 
covariance parameters are set to a = 1/4 and T = 1/16, N = 200 and the numerical 
method yields the rates a = 1, fi = 1 (and 7 = 1). See Figure 4.5 for a numerical 
verfication of these decay rates over a sequence of times. In Figure |4.6[ the gold 
standard mean and covariance of z n has been approximated by the filtering methods. 
We observe an RMSE decay rate slightly slower than 0(s -1 / 2 ) for MLEnKF and 
©(s -1 / 3 ) for EnKF, where s denotes runtime in seconds. 


5. Conclusion. A first attempt, to the knowledge of the authors, at filtering 
using a multilevel Monte Carlo approach is considered in the present work. A proof 
based on induction of the optimality of the cost as a function of the error, or equiv¬ 
alently the error as a function of the cost, is given. This shows that an optimality 
result, which is slightly penalized with respect to the vanilla Monte Carlo result, can 
extend to the case of sequential inference. There is either a logarithmic term which 
grows with the number of steps, or a slightly higher rate 0(e~ 2 ~ s ), for 5 > 0. The 
ensemble Kalman filter is considered, which is consistent only in the case of a linear 
Gaussian model. However, the mean-field limiting equation may be viewed as a one- 
step optimal linear (in the observation) filter, and the convergence to this limiting 
distribution has the desired rate for a more general class of models. Since this work 
was submitted, two papers have appeared which consider consistent nonlinear filters 
for similar models. The paper [27] considered multilevel particle filters with optimally 
coupled multinomial resampling and found the rate of strong convergence /3 is effec¬ 
tively reduced by a factor of 2 as a result of the resampling. The paper [2T] considered 
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Figure 4.5. N umer ical estimates of the decay rates over a sequence of times for the problem 
presented in Section \ 4. 2\ with tp(v) = v. The computations use M = 10 6 particles on every level. 



Figure 4.6. Accuracy vs. computational cost comparison of the EnKF and MLEnKF methods 
on the filtering problem presented in Section \f.2\ The error is measured in terms of the RMSE ( |4.3[ ) 
for the mean (left plot) and covariance (right plot), and the computational cost is measured in 
computer runtime. 


multilevel ensemble transform particle filters, which use an optimally coupled deter¬ 
ministic transformation in place of the standard random resampling mechanism, and 
numerical results indicated the rate may be reduced in some cases and the same in 
others. 
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